Clustering metagenomic sequences with interpolated Markov models
Related resources
Clustering Sequences with Hidden Markov Models
This paper discusses a probabilistic model-based approach to clustering sequences, using hidden Markov models (HMMs). The problem can be framed as a generalization of the standard mixture model approach to clustering in feature space. Two primary issues are addressed. First, a novel parameter initialization procedure is proposed, and second, the more difficult problem of determining the number...
Interpolated Markov models for eukaryotic gene finding.
Computational gene finding research has emphasized the development of gene finders for bacterial and human DNA. This has left genome projects for some small eukaryotes without a system that addresses their needs. This paper reports on a new system, GlimmerM, that was developed to find genes in the malaria parasite Plasmodium falciparum. Because the gene density in P. falciparum is relatively hi...
Microbial gene identification using interpolated Markov models.
This paper describes a new system, GLIMMER, for finding genes in microbial genomes. In a series of tests on Haemophilus influenzae, Helicobacter pylori and other complete microbial genomes, this system has proven to be very accurate at locating virtually all the genes in these sequences, outperforming previous methods. A conservative estimate based on experiments on H. pylori and H. influenzae ...
Unsupervised Two-Way Clustering of Metagenomic Sequences
A major challenge facing metagenomics is the development of tools for the characterization of functional and taxonomic content of vast amounts of short metagenome reads. The efficacy of clustering methods depends on the number of reads in the dataset, the read length and relative abundances of source genomes in the microbial community. In this paper, we formulate an unsupervised naive Bayes mul...
Similarity-Based Clustering of Sequences Using Hidden Markov Models
Hidden Markov models constitute a widely employed tool for sequential data modelling; nevertheless, their use in the clustering context has been poorly investigated. In this paper a novel scheme for HMM-based sequential data clustering is proposed, inspired by the similarity-based paradigm recently introduced in the supervised learning context. With this approach, a new representation space is bu...
Journal
Journal title: BMC Bioinformatics
Year: 2010
ISSN: 1471-2105
DOI: 10.1186/1471-2105-11-544